Safe Reinforcement Learning via Shielding under Partial Observability

Authors

Abstract

Safe exploration is a common problem in reinforcement learning (RL) that aims to prevent agents from making disastrous decisions while exploring their environment. A family of approaches to this problem assume domain knowledge in the form of a (partial) model of the environment to decide upon the safety of an action. A so-called shield forces the RL agent to select only safe actions. However, for adoption in various applications, one must look beyond enforcing safety and also ensure the applicability of RL with good performance. We extend the applicability of shields via tight integration with state-of-the-art deep RL, and provide an extensive, empirical study in challenging, sparse-reward environments under partial observability. We show that a carefully integrated shield ensures safety and can improve the convergence rate and final performance of RL agents. We furthermore show that a shield can be used to bootstrap RL agents: they remain safe after initial learning in a shielded setting, allowing us to eventually disable a potentially too conservative shield.
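To make the shielding idea concrete, the following is a minimal sketch of shielded action selection. It is not the paper's implementation: the `Shield` class, its `safe_action_table` lookup, and the `shielded_epsilon_greedy` helper are hypothetical names introduced here for illustration; in the paper the shield is computed from a (partial) model of the environment, which is not reproduced in this sketch.

```python
import random

class Shield:
    """Maps an observation to the subset of actions deemed safe.

    Assumption: the shield is given as a precomputed lookup table from
    (observation, action) pairs to a safety verdict. The real shield in
    the paper is derived from a (partial) environment model instead.
    """

    def __init__(self, safe_action_table):
        self.safe_action_table = safe_action_table

    def safe_actions(self, observation, actions):
        # Pairs absent from the table are treated as safe (an assumption).
        return [a for a in actions
                if self.safe_action_table.get((observation, a), True)]


def shielded_epsilon_greedy(q_values, observation, actions, shield, epsilon=0.1):
    """Epsilon-greedy action selection restricted to the shield's safe actions."""
    allowed = shield.safe_actions(observation, actions)
    if not allowed:
        # Fallback if the shield blocks every action; a real system would
        # need a principled policy for this case.
        allowed = actions
    if random.random() < epsilon:
        return random.choice(allowed)   # explore, but only among safe actions
    return max(allowed, key=lambda a: q_values[a])  # exploit among safe actions
```

In the deep RL setting described in the abstract, the same idea is typically realized by masking the network's logits or Q-values for unsafe actions before sampling; the table-based shield above is only meant to illustrate the mechanism of restricting the agent to safe actions.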


Related Articles

Safe Reinforcement Learning via Shielding

Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive sys...


Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

Interesting real-world datasets often exhibit nonlinear, noisy, continuous-valued states that are unexplorable, are poorly described by first principles, and are only partially observable. If partial observability can be overcome, these constraints suggest the use of model-based reinforcement learning. We experiment with manifold embeddings to reconstruct the observable state-space in the conte...


Learning Abduction under Partial Observability

Juba recently proposed a formulation of learning abductive reasoning from examples, in which both the relative plausibility of various explanations, as well as which explanations are valid, are learned directly from data. The main shortcoming of this formulation of the task is that it assumes access to full-information (i.e., fully specified) examples; relatedly, it offers no role for declarati...


Convergence Results for Reinforcement Learning with Partial Observability

In this report we propose two reinforcement learning algorithms developed to determine the optimal solution for Partially Observable Markov Decision Processes. The algorithms are analyzed and the corresponding convergence properties of both algorithms are established. Finally, both algorithms are tested in several examples from the literature and their performance is discussed. The lines for fu...


Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability

Many real-world tasks involve multiple agents with partial observability and limited communication. Learning is challenging in these settings due to local viewpoints of agents, which perceive the world as non-stationary due to concurrently exploring teammates. Approaches that learn specialized policies for individual tasks face problems when applied to the real world: not only do agents have to ...



Journal

Journal Title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i12.26723